Agentic Browser

Extension Architecture

Table of Contents#

  1. Introduction

  2. Project Structure

  3. Core Components

  4. Architecture Overview

  5. Detailed Component Analysis

  6. Dependency Analysis

  7. Performance Considerations

  8. Security Considerations

  9. Cross-Browser Compatibility

  10. Troubleshooting Guide

  11. Conclusion

Introduction#

This document describes the architecture of the browser extension, which is built with the WXT framework. It focuses on the three main entry points:

  • Background script for extension-wide operations and cross-tab coordination

  • Content script for page-level automation and DOM interaction

  • Side panel UI for user interaction and agent orchestration

It documents extension configuration, manifest setup, messaging architecture, component relationships, lifecycle management, and integration patterns with browser APIs. Security, permissions, and performance optimization strategies are also covered.

Project Structure#

The extension is organized under the extension directory with WXT entrypoints and React-based UI components. Key areas:

  • Configuration: wxt.config.ts defines module usage, permissions, and host permissions

  • Background: background.ts handles messaging, tab management, and agent tool execution

  • Content: content.ts manages page-level automation and DOM interactions

  • Side Panel: React app mounted via shadow DOM with hooks for auth, tabs, and WebSocket

  • Utilities: websocket-client.ts, executeActions.ts, and shared parsing utilities

graph TB
  subgraph "Extension Root"
    CFG["wxt.config.ts"]
    PKG["package.json"]
    TS["tsconfig.json"]
  end
  subgraph "Entry Points"
    BG["background.ts"]
    CT["content.ts"]
    SP_IDX["sidepanel/index.tsx"]
    SP_MAIN["sidepanel/main.tsx"]
    APP["sidepanel/App.tsx"]
  end
  subgraph "Side Panel Hooks"
    AUTH["hooks/useAuth.ts"]
    TABS["hooks/useTabManagement.ts"]
  end
  subgraph "Utilities"
    WS["utils/websocket-client.ts"]
    EXE["utils/executeActions.ts"]
  end
  CFG --> BG
  CFG --> CT
  CFG --> SP_IDX
  SP_IDX --> APP
  APP --> AUTH
  APP --> TABS
  APP --> WS
  APP --> EXE
  PKG --> WS
  PKG --> EXE

Core Components#

  • Background Script: Central orchestrator for messaging, tab state, and agent tool execution. Handles message routing for agent actions, tab operations, and Gemini requests.

  • Content Script: Page-level automation that injects or removes visual overlays and performs DOM actions (click, type, scroll) via injected scripts.

  • Side Panel UI: React application mounted in a shadow DOM, providing user controls, authentication, tab management, and agent execution with WebSocket integration.

Key responsibilities:

  • Messaging: bidirectional communication between UI, background, and content scripts

  • Permissions: activeTab, tabs, storage, scripting, identity, sidePanel, webNavigation, webRequest, cookies, bookmarks, history, clipboard, notifications, contextMenus, downloads

  • Cross-origin: host_permissions for <all_urls>
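
The permissions and host permissions listed above would typically be declared under the manifest key in wxt.config.ts. The fragment below is a sketch, not the project's actual configuration: it includes only a subset of the permissions named above, and ManifestFragment is a local stand-in type so the snippet compiles on its own.

```typescript
// Local stand-in type (assumption) so this fragment is self-contained.
interface ManifestFragment {
  permissions: string[];
  host_permissions: string[];
}

// Subset of the permissions listed above; the real wxt.config.ts may differ.
const manifest: ManifestFragment = {
  permissions: ["activeTab", "tabs", "storage", "scripting", "identity", "sidePanel"],
  host_permissions: ["<all_urls>"],
};

// In wxt.config.ts this object would be wired up roughly as:
//   import { defineConfig } from "wxt";
//   export default defineConfig({ manifest });
```

Keeping the manifest fragment in one typed object makes it easier to review the permission list when tightening it later.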

Architecture Overview#

The extension follows a layered architecture:

  • UI Layer: Side panel React app with hooks for auth and tab management

  • Control Layer: Background script managing messaging and cross-tab operations

  • Automation Layer: Content script performing DOM-level actions

  • Utility Layer: WebSocket client and action executor utilities

graph TB
  UI["Side Panel UI<br/>React App"] --> BG["Background Script<br/>Messaging Hub"]
  BG --> CT["Content Script<br/>DOM Automation"]
  UI --> WS["WebSocket Client"]
  UI --> AUTH["Auth Hook"]
  UI --> TABS["Tab Management Hook"]
  BG --> UTIL_EXE["Action Executor"]
  UI --> UTIL_WS["WebSocket Client"]
  subgraph "Browser APIs"
    RT["runtime"]
    TABS_API["tabs"]
    ST["storage"]
    ID["identity"]
    SCR["scripting"]
    NAV["webNavigation"]
    REQ["webRequest"]
  end
  BG --> RT
  BG --> TABS_API
  BG --> ST
  BG --> ID
  BG --> SCR
  BG --> NAV
  BG --> REQ

Detailed Component Analysis#

Background Script#

Responsibilities:

  • Message routing for agent tool execution, tab activation/deactivation, tab queries, action execution, Gemini requests, and generated agent runs

  • Tab tracking via browser.tabs listeners and storage updates

  • Dynamic imports for external libraries (e.g., Gemini SDK)

  • Injection of content scripts and inter-tab messaging

Key flows:

  • Message listener routes incoming runtime messages to handlers

  • Tab management updates local storage for UI consumption

  • Action execution injects content scripts and forwards actions to content script

sequenceDiagram
  participant UI as "Side Panel UI"
  participant BG as "Background Script"
  participant CT as "Content Script"
  participant TAB as "Tabs API"
  UI->>BG : "ACTIVATE_AI_FRAME" or "DEACTIVATE_AI_FRAME"
  BG->>TAB : "Query active tab"
  BG->>CT : "Inject/remove overlay (via scripting)"
  CT-->>BG : "Activation result"
  UI->>BG : "EXECUTE_ACTION"
  BG->>TAB : "Inject content script"
  BG->>CT : "Send PERFORM_ACTION"
  CT-->>BG : "Action result"
  BG-->>UI : "Response"
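
The EXECUTE_ACTION flow (inject the content script, then forward PERFORM_ACTION) could be sketched as follows. This is an illustration under stated assumptions, not the code in background.ts: ScriptingLike and TabsLike are local interfaces that mirror the shape of browser.scripting.executeScript and browser.tabs.sendMessage so the sketch runs without a browser, and the content script path and the message payload shape are guesses.

```typescript
// Minimal slices of the scripting and tabs APIs, declared locally (assumption).
interface ScriptingLike {
  executeScript(details: { target: { tabId: number }; files: string[] }): Promise<unknown>;
}
interface TabsLike {
  sendMessage(tabId: number, message: unknown): Promise<unknown>;
}

interface ActionOutcome {
  success: boolean;
  result?: unknown;
  error?: string;
}

async function executeActionInTab(
  scripting: ScriptingLike,
  tabs: TabsLike,
  tabId: number,
  action: unknown,
  contentScriptFile: string = "content-scripts/content.js", // assumed build output path
): Promise<ActionOutcome> {
  try {
    // Step 1: make sure the content script is present in the target tab.
    await scripting.executeScript({ target: { tabId }, files: [contentScriptFile] });
    // Step 2: forward the action and wait for the content script's reply.
    const result = await tabs.sendMessage(tabId, { type: "PERFORM_ACTION", action });
    return { success: true, result };
  } catch (err) {
    // Injection and messaging failures are reported back to the UI, not thrown.
    return { success: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

In the extension itself the first two arguments would be browser.scripting and browser.tabs; passing them in makes the ordering easy to exercise with fakes.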

Content Script#

Responsibilities:

  • Optional creation/removal of visual AI frame overlays

  • DOM-level actions (click, type, scroll) via injected functions

  • Basic action parsing and execution helpers
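
As an illustration of the action parsing mentioned above, the sketch below validates an untrusted payload before any DOM work happens. The DomAction shape and its field names (kind, selector, text, deltaY) are hypothetical and are not taken from content.ts.

```typescript
// Hypothetical action shape covering the click, type, and scroll actions.
type DomAction =
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "scroll"; deltaY: number };

// Returns a well-typed action, or null when the payload is malformed.
function parseAction(raw: unknown): DomAction | null {
  if (typeof raw !== "object" || raw === null) return null;
  const record = raw as Record<string, unknown>;
  const selector = record["selector"];
  const text = record["text"];
  const deltaY = record["deltaY"];

  switch (record["kind"]) {
    case "click":
      return typeof selector === "string" ? { kind: "click", selector } : null;
    case "type":
      return typeof selector === "string" && typeof text === "string"
        ? { kind: "type", selector, text }
        : null;
    case "scroll":
      return typeof deltaY === "number" && Number.isFinite(deltaY)
        ? { kind: "scroll", deltaY }
        : null;
    default:
      return null;
  }
}
```

Rejecting malformed payloads up front means the DOM helpers only ever see actions with the fields they need.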

Notes:

  • The current implementation focuses on DOM manipulation and, in the source reviewed for this document, does not register a message listener

  • Commented-out code shows an earlier approach to overlay injection and removal

Side Panel UI#

Responsibilities:

  • Mounts React app in a shadow DOM

  • Provides authentication flow (Google OAuth and demo GitHub login)

  • Manages active tab and tab list

  • Integrates WebSocket client for agent execution and statistics

  • Executes agent commands and browser actions

Key integrations:

  • Shadow DOM mounting via WXT content script API

  • Authentication hook for OAuth and token refresh

  • Tab management hook for active tab and tab list

  • WebSocket client for agent execution and progress updates

  • Action executor for browser-level actions

sequenceDiagram
  participant UI as "Side Panel UI"
  participant AUTH as "Auth Hook"
  participant TABS as "Tab Management Hook"
  participant WS as "WebSocket Client"
  participant BG as "Background Script"
  UI->>AUTH : "handleLogin()"
  AUTH->>AUTH : "Launch OAuth flow"
  AUTH-->>UI : "User data + tokens"
  UI->>TABS : "loadTabs()"
  TABS->>BG : "GET_ALL_TABS"
  BG-->>TABS : "Tab list"
  TABS-->>UI : "Active tab + tabs"
  UI->>WS : "executeAgent()"
  WS->>WS : "Emit execute_agent"
  WS-->>UI : "Progress + Result"
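
The executeAgent() exchange in the diagram (emit execute_agent, then receive progress and a result) might look roughly like the sketch below. SocketLike is a local interface modelled on the emit/on/off methods of a Socket.IO client socket so the snippet runs without the library. The execute_agent event name comes from the diagram, while agent_progress and agent_result are hypothetical names chosen for illustration.

```typescript
// Socket.IO-style surface, declared locally (assumption).
interface SocketLike {
  emit(event: string, payload: unknown): void;
  on(event: string, listener: (data: unknown) => void): void;
  off(event: string, listener: (data: unknown) => void): void;
}

// Starts an agent run, streams progress to the caller, resolves with the result.
function executeAgent(
  socket: SocketLike,
  command: string,
  onProgress: (data: unknown) => void,
): Promise<unknown> {
  return new Promise((resolve) => {
    const handleProgress = (data: unknown): void => onProgress(data);
    const handleResult = (data: unknown): void => {
      // Detach both listeners so repeated runs do not accumulate handlers.
      socket.off("agent_progress", handleProgress); // hypothetical event name
      socket.off("agent_result", handleResult); // hypothetical event name
      resolve(data);
    };
    socket.on("agent_progress", handleProgress);
    socket.on("agent_result", handleResult);
    socket.emit("execute_agent", { command });
  });
}
```

Removing the listeners once the result arrives is the part most relevant to the memory cleanup concerns discussed later in this document.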

Messaging System Architecture#

The messaging system connects the UI, background, and content layers:

  • UI sends commands to background via runtime.sendMessage

  • Background routes messages to appropriate handlers

  • Background injects content scripts and communicates with content script via tabs.sendMessage

  • Content script executes DOM actions and returns results

flowchart TD
  UI["Side Panel UI"] --> BG_MSG["Background Message Listener"]
  BG_MSG --> BG_ROUTER{"Route by type"}
  BG_ROUTER --> |EXECUTE_AGENT_TOOL| BG_TOOL["handleExecuteAgentTool"]
  BG_ROUTER --> |ACTIVATE_AI_FRAME| BG_ACT["handleActivateAIFrame"]
  BG_ROUTER --> |DEACTIVATE_AI_FRAME| BG_DEACT["handleDeactivateAIFrame"]
  BG_ROUTER --> |GET_ACTIVE_TAB| BG_GET_ACTIVE["handleGetActiveTab"]
  BG_ROUTER --> |GET_ALL_TABS| BG_GET_ALL["handleGetAllTabs"]
  BG_ROUTER --> |EXECUTE_ACTION| BG_EXEC["handleExecuteAction"]
  BG_ROUTER --> |GEMINI_REQUEST| BG_GEM["handleGeminiRequest"]
  BG_ROUTER --> |RUN_GENERATED_AGENT| BG_RUN["handleRunGeneratedAgent"]
  BG_EXEC --> CT_INJ["Inject content script"]
  CT_INJ --> CT_MSG["tabs.sendMessage to content"]
  CT_MSG --> CT_PERF["performAction()"]
  CT_PERF --> BG_RESP["Return result to background"]
  BG_RESP --> UI_RESP["Return result to UI"]
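
The "route by type" step can be expressed as a lookup from message type to handler. The sketch below is one way to do it, not the routing code in background.ts: the message types come from the diagram, while the ExtensionMessage and HandlerResult shapes and the createRouter helper are assumptions made for illustration.

```typescript
// Message types taken from the routing diagram; payload shapes are assumptions.
type MessageType =
  | "EXECUTE_AGENT_TOOL"
  | "ACTIVATE_AI_FRAME"
  | "DEACTIVATE_AI_FRAME"
  | "GET_ACTIVE_TAB"
  | "GET_ALL_TABS"
  | "EXECUTE_ACTION"
  | "GEMINI_REQUEST"
  | "RUN_GENERATED_AGENT";

interface ExtensionMessage {
  type: MessageType;
  payload?: unknown;
}

interface HandlerResult {
  success: boolean;
  data?: unknown;
  error?: string;
}

type Handler = (payload: unknown) => Promise<HandlerResult>;

// Builds a routing function from a (possibly partial) handler table.
function createRouter(handlers: Partial<Record<MessageType, Handler>>) {
  return async function route(message: ExtensionMessage): Promise<HandlerResult> {
    const handler = handlers[message.type];
    if (!handler) {
      return { success: false, error: `No handler for ${message.type}` };
    }
    try {
      return await handler(message.payload);
    } catch (err) {
      // Always answer the sender, even when a handler throws.
      return { success: false, error: err instanceof Error ? err.message : String(err) };
    }
  };
}

// Possible wiring in the background script (Chrome-style async response):
//   const route = createRouter({ GET_ALL_TABS: handleGetAllTabs });
//   browser.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
//     route(msg).then(sendResponse);
//     return true; // keep the message channel open for the async reply
//   });
```

A table-driven router keeps every handler behind the same error boundary, so the UI always receives a response object instead of a silent timeout.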

Component Relationships#

  • Side Panel App depends on hooks for authentication and tab management

  • AgentExecutor integrates with WebSocket client and action executor

  • Background script coordinates messaging and tab operations

  • Content script provides DOM-level automation

classDiagram
  class App {
    +user
    +activeTab
    +apiKey
    +response
    +isSettingsOpen
  }
  class useAuth {
    +handleLogin()
    +handleGitHubLogin()
    +handleLogout()
    +getTokenAge()
    +getTokenExpiry()
    +handleManualRefresh()
  }
  class useTabManagement {
    +tabs
    +activeTab
    +loadTabs()
  }
  class AgentExecutor {
    +handleExecute()
    +handleStop()
    +progress
    +result
  }
  class WebSocketClient {
    +executeAgent()
    +stopAgent()
    +getStats()
    +isSocketConnected()
  }
  class executeActions {
    +executeBrowserActions()
  }
  App --> useAuth : "uses"
  App --> useTabManagement : "uses"
  App --> AgentExecutor : "renders"
  AgentExecutor --> WebSocketClient : "uses"
  AgentExecutor --> executeActions : "uses"

Dependency Analysis#

External dependencies include React, Socket.IO client, and Google Generative AI SDK. Internal dependencies are structured around hooks and utilities.

graph LR
  PKG["package.json"] --> REACT["react"]
  PKG --> SOCKET["socket.io-client"]
  PKG --> GAISDK["@google/generative-ai"]
  PKG --> RADIX["@radix-ui/react-select"]
  PKG --> MARKDOWN["react-markdown"]
  APP["App.tsx"] --> AUTH["useAuth.ts"]
  APP --> TABS["useTabManagement.ts"]
  APP --> WS["websocket-client.ts"]
  APP --> AE["AgentExecutor.tsx"]
  AE --> EXE["executeActions.ts"]

Performance Considerations#

  • Minimize DOM operations: batch DOM queries and mutations in content script

  • Rate-limit UI updates: throttle progress updates and debounce tab list refreshes

  • Lazy loading: defer heavy computations until needed (e.g., Gemini SDK dynamic import)

  • Efficient messaging: avoid excessive message traffic; coalesce updates

  • Memory cleanup: remove event listeners and unmount React roots when appropriate

  • WebSocket reconnection: configure retry policies and backoff strategies
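
For the reconnection point above, a capped exponential backoff is one common retry policy. The Socket.IO client has its own reconnectionDelay and reconnectionDelayMax options for this purpose; the helper below is a hypothetical, library-independent sketch of the idea, with default values chosen only for illustration.

```typescript
// Delay before reconnect attempt `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  // Guard against negative or fractional attempt counters.
  const exponent = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * Math.pow(2, exponent));
}
```

With the defaults, the first retries wait 1 s, 2 s, 4 s, and 8 s, and later attempts are held at the 30 s cap. Adding random jitter on top is a common refinement when many clients may reconnect at once.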

Security Considerations#

  • Permissions: carefully review and limit permissions to those required for functionality

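  • Host permissions: <all_urls> grants access to every site the user visits; narrow it to the required origins where possible and keep a strict content security policy for extension pages

  • OAuth: validate redirect URIs and handle errors gracefully; keep tokens in extension storage rather than anywhere page scripts can read, and remember that extension storage is not encrypted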

  • Content script isolation: avoid exposing sensitive data; sanitize inputs before DOM manipulation

  • Cross-origin requests: validate and sanitize external API responses; handle rate limits and errors
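
One concrete form of input sanitization is to check URLs produced by the agent or an external API before acting on them. The helper below is a hypothetical sketch: isSafeNavigationTarget and its allow-list are not part of the extension's code, and the right set of protocols depends on what the agent is meant to do.

```typescript
// Protocols the agent may navigate to (assumed policy, adjust as needed).
const ALLOWED_PROTOCOLS: string[] = ["http:", "https:"];

// Returns true only for absolute URLs whose protocol is on the allow-list.
function isSafeNavigationTarget(rawUrl: string): boolean {
  try {
    const url = new URL(rawUrl);
    return ALLOWED_PROTOCOLS.indexOf(url.protocol) !== -1;
  } catch {
    // Relative or malformed input is rejected rather than guessed at.
    return false;
  }
}
```

This rejects javascript: and browser-internal URLs as well as strings that do not parse as absolute URLs.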

Cross-Browser Compatibility#

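  • WXT can build for multiple browsers, but not every API exists everywhere; for example, the sidePanel API is available in Chromium-based browsers, while Firefox provides sidebarAction instead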

  • Use browser polyfills or feature detection for APIs not universally available

  • Test manifest keys and permissions across Chrome, Firefox, and Edge

  • Validate content script injection and messaging behavior differences
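
Feature detection for the panel API could be sketched as below. Chromium-based browsers expose sidePanel and Firefox exposes sidebarAction; the detectPanelApi helper and its return values are hypothetical and only illustrate the check.

```typescript
type PanelApi = "sidePanel" | "sidebarAction" | "none";

// Inspects an extension API namespace (e.g. the global `browser` or `chrome`
// object) and reports which panel API, if any, is available.
function detectPanelApi(api: { sidePanel?: unknown; sidebarAction?: unknown }): PanelApi {
  if (api.sidePanel !== undefined) return "sidePanel";
  if (api.sidebarAction !== undefined) return "sidebarAction";
  return "none";
}
```

Callers can branch on the result once at startup instead of scattering browser checks through the UI code.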

Troubleshooting Guide#

Common issues and resolutions:

  • Messaging timeouts: verify that the message listener is registered and that asynchronous handlers keep the channel open (in Chrome, return true from the listener) and always send a response

  • Content script injection failures: confirm scripting permissions and correct file paths

  • Tab operations failing: check tabs permissions and active tab queries

  • WebSocket connectivity: verify URL configuration and network availability

  • Authentication errors: validate OAuth flow and token refresh logic
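
When diagnosing messaging timeouts, it can help to put an explicit deadline on each request so a missing response surfaces as an error instead of a hung promise. The withTimeout helper below is a hypothetical utility, not part of the extension's code.

```typescript
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string = "message"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Possible use from the side panel (5 s is an arbitrary choice):
//   const tabs = await withTimeout(
//     browser.runtime.sendMessage({ type: "GET_ALL_TABS" }), 5000, "GET_ALL_TABS");
```

The label in the error message identifies which request went unanswered, which narrows the search to a single handler.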

Conclusion#

The extension architecture leverages WXT’s entry points and React to deliver a cohesive browser automation experience. The background script centralizes messaging and coordination, the content script handles page-level automation, and the side panel UI provides user interaction and agent orchestration. Proper configuration, security hardening, and performance optimization are essential for robust cross-browser deployment.